Spotify Exploratory Dataset Analysis

Introduction

Code and Documentation

Version control:

  • Git

  • GitHub

  • Consistent file structure and naming (e.g., 0-dataMunging)

A GitHub page that includes all the files

Clear and properly commented code

Background

  • The data was acquired through the Spotify API in 2020 by TidyTuesday

  • The class of the data frame

[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame" 

  • The number of rows

[1] 32833

The data set

  • The variables (columns)
 [1] "track_id"                 "track_name"              
 [3] "track_artist"             "track_popularity"        
 [5] "track_album_id"           "track_album_name"        
 [7] "track_album_release_date" "playlist_name"           
 [9] "playlist_id"              "playlist_genre"          
[11] "playlist_subgenre"        "danceability"            
[13] "energy"                   "key"                     
[15] "loudness"                 "mode"                    
[17] "speechiness"              "acousticness"            
[19] "instrumentalness"         "liveness"                
[21] "valence"                  "tempo"                   
[23] "duration_ms"             

The different genres

  • Number of tracks in each main genre (playlist_genre)

  edm latin   pop   r&b   rap  rock 
 6043  5155  5507  5431  5746  4951 
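A minimal base-R sketch of how a per-genre count like this can be produced. Base R keeps the snippet free of package dependencies, although the `spec_tbl_df` class above suggests the data was read with readr. The data frame name `spotify_songs` and its values are made up for illustration; only the column name `playlist_genre` comes from the dataset.

```r
# Toy stand-in for the real data (values are made up for illustration)
spotify_songs <- data.frame(
  track_id       = c("a", "b", "c", "d", "e"),
  playlist_genre = c("pop", "rap", "pop", "edm", "rap")
)

# Count tracks per genre; table() orders the genres alphabetically
genre_counts <- table(spotify_songs$playlist_genre)
print(genre_counts)

# Sanity check: the per-genre counts add up to the number of rows
stopifnot(sum(genre_counts) == nrow(spotify_songs))
```

On the real data the same check would compare the six counts above with the 32,833 rows.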

Cleaning: NA values

  • Rows with a missing (NA) track name, artist, and album name
# A tibble: 5 × 4
  track_name track_artist track_album_name track_id              
  <chr>      <chr>        <chr>            <chr>                 
1 <NA>       <NA>         <NA>             69gRFGOWY9OMpFJgFol1u0
2 <NA>       <NA>         <NA>             5cjecvX0CmC9gK0Laf5EMQ
3 <NA>       <NA>         <NA>             5TTzhRSWQS4Yu8xTgAuq6D
4 <NA>       <NA>         <NA>             3VKFip3OdAvv4OfNTgFWeQ
5 <NA>       <NA>         <NA>             69gRFGOWY9OMpFJgFol1u0

  • After investigating, these are unique songs; note that rows 1 and 5 share the same track_id, so the five rows represent four distinct tracks
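One way to list the rows with missing metadata, sketched in base R on a small made-up data frame (`songs` is a hypothetical name). It also counts distinct `track_id` values, since a repeated ID among the NA rows means the same track appears more than once.

```r
# Toy data with missing metadata in some rows (made-up values)
songs <- data.frame(
  track_id         = c("id1", "id2", "id3", "id4", "id2"),
  track_name       = c("Song A", NA, NA, "Song D", NA),
  track_artist     = c("Artist A", NA, NA, "Artist D", NA),
  track_album_name = c("Album A", NA, NA, "Album D", NA)
)

text_cols <- c("track_name", "track_artist", "track_album_name")

# complete.cases() is FALSE for any row with at least one NA in these columns
has_na  <- !complete.cases(songs[, text_cols])
na_rows <- songs[has_na, c(text_cols, "track_id")]
print(na_rows)

# How many distinct tracks do the NA rows represent?
n_distinct_na <- length(unique(na_rows$track_id))
```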

Cleaning: Duplicates

  • Songs with a duplicated track name
[1] 9383
  • Songs with a duplicated track ID
[1] 4477
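Assuming the counts above come from R's `duplicated()`, which flags every occurrence after the first, they can be reproduced like this on a made-up data frame. A duplicated name is a weaker signal than a duplicated ID, because different artists can release songs with the same title.

```r
# Made-up data: "Hello" appears three times, but only id1 is a true repeat
songs <- data.frame(
  track_id   = c("id1", "id2", "id1", "id3", "id4"),
  track_name = c("Hello", "Hello", "Hello", "Intro", "Outro")
)

# duplicated() marks each value that already appeared earlier in the vector
dup_names <- sum(duplicated(songs$track_name))
dup_ids   <- sum(duplicated(songs$track_id))

dup_names
dup_ids
```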

Clean data

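  • Tracks per genre after removing duplicates (the counts total 28,356 rows, which is 32,833 - 4,477, i.e. one row removed per duplicated track ID)

  edm latin   pop   r&b   rap  rock 
 4877  4137  5132  4504  5401  4305 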
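A base-R sketch of the de-duplication step, assuming duplicates are dropped by `track_id` and the first occurrence is kept; the toy data is made up. Since each row also carries playlist columns, the same track can appear in several playlists, possibly under different genres, so with this approach the genre that survives depends on row order.

```r
# Made-up data: id1 appears twice, once under "pop" and once under "edm"
songs <- data.frame(
  track_id       = c("id1", "id2", "id1", "id3"),
  playlist_genre = c("pop", "rap", "edm", "pop")
)

# Keep only the first row for each track_id
clean <- songs[!duplicated(songs$track_id), ]

# Rows removed should equal the number of duplicated IDs
stopifnot(nrow(songs) - nrow(clean) == sum(duplicated(songs$track_id)))

# Recount tracks per genre on the de-duplicated data
table(clean$playlist_genre)
```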

Exploratory Data Analysis

Initial Data Exploration

  • Most popular artists

  • Most popular albums

  • Most popular genres
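The slides do not say how "most popular" is defined. The sketch below assumes it means the mean `track_popularity` per group and uses a hypothetical helper `top_by()` so that artists, albums and genres are ranked the same way. The data is made up.

```r
# Made-up data for illustration
songs <- data.frame(
  track_artist     = c("A", "A", "B", "C", "C"),
  track_album_name = c("X", "X", "Y", "Z", "W"),
  playlist_genre   = c("pop", "pop", "rap", "latin", "latin"),
  track_popularity = c(80, 60, 90, 50, 40)
)

# Hypothetical helper: mean popularity per group, highest first
top_by <- function(df, group_col, n = 10) {
  agg <- aggregate(df$track_popularity,
                   by = list(group = df[[group_col]]),
                   FUN = mean)
  names(agg)[2] <- "mean_popularity"
  head(agg[order(-agg$mean_popularity), ], n)
}

top_artists <- top_by(songs, "track_artist")
top_albums  <- top_by(songs, "track_album_name")
top_genres  <- top_by(songs, "playlist_genre")
top_genres
```

A group with a single track can top a mean-based ranking, so requiring a minimum number of tracks per group may give a more stable list.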

How features change within genres

Latin

Latin stands out in danceability and valence.

  • danceability

  • valence
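To compare genres on these two features, the per-genre means can be computed with `aggregate()`. This is a sketch on made-up values, not the deck's actual figures.

```r
# Made-up audio features for three genres
songs <- data.frame(
  playlist_genre = c("latin", "latin", "rock", "rock", "pop", "pop"),
  danceability   = c(0.80, 0.70, 0.50, 0.40, 0.65, 0.60),
  valence        = c(0.70, 0.60, 0.40, 0.50, 0.55, 0.45)
)

# Mean of each feature per genre (formula interface of aggregate)
feature_means <- aggregate(cbind(danceability, valence) ~ playlist_genre,
                           data = songs, FUN = mean)
print(feature_means)

# Genre with the highest mean for each feature
top_dance   <- feature_means$playlist_genre[which.max(feature_means$danceability)]
top_valence <- feature_means$playlist_genre[which.max(feature_means$valence)]
```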

Release date

  • Is there a relationship between album release date and popularity?
  • Are features affected by album release date?
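A sketch for both questions, assuming each `track_album_release_date` string begins with a four-digit year (some releases may record only the year, or the year and month). It correlates release year with popularity and with one audio feature; the values are made up.

```r
# Made-up data; dates have mixed precision
songs <- data.frame(
  track_album_release_date = c("1999-05-01", "2005", "2012-11",
                               "2019-07-19", "2020-01-10"),
  track_popularity = c(40, 45, 55, 70, 75),
  energy           = c(0.9, 0.8, 0.7, 0.6, 0.5)
)

# Assumption: the first four characters are always the year
release_year <- as.integer(substr(songs$track_album_release_date, 1, 4))

# Pearson correlation of release year with popularity and with a feature
features <- c("track_popularity", "energy")
year_cor <- sapply(features, function(f) cor(release_year, songs[[f]]))
round(year_cor, 2)
```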

Correlation

What’s the correlation between energy, loudness and acousticness?

  • Positive correlation between energy and loudness

  • Negative correlation between energy and acousticness
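The correlation matrix for the three features can be computed with `cor()`, which uses Pearson correlation by default. The toy values are made up; `loudness` is measured in decibels, so louder tracks have values closer to 0.

```r
# Made-up audio features
songs <- data.frame(
  energy       = c(0.90, 0.75, 0.60, 0.40, 0.20),
  loudness     = c(-3, -5, -7, -10, -14),
  acousticness = c(0.05, 0.10, 0.30, 0.60, 0.85)
)

# Pairwise Pearson correlations between the three features
cor_matrix <- cor(songs[, c("energy", "loudness", "acousticness")])
round(cor_matrix, 2)
```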

  • Does this correlation exist in a specific genre?
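To repeat the energy and loudness correlation within each genre, one option is to split the rows by `playlist_genre` and apply `cor()` to each piece. The values are made up and the snippet uses base R only.

```r
# Made-up data for two genres
songs <- data.frame(
  playlist_genre = rep(c("rock", "edm"), each = 4),
  energy   = c(0.9, 0.7, 0.5, 0.3,   0.9, 0.8, 0.7, 0.6),
  loudness = c(-4, -6, -8, -11,      -4, -6, -5, -7)
)

# One data frame per genre, then the energy-loudness correlation in each
by_genre  <- split(songs, songs$playlist_genre)
genre_cor <- sapply(by_genre, function(d) cor(d$energy, d$loudness))
round(genre_cor, 2)
```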

Track duration

Does the track duration affect the popularity of the song?
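A sketch of the duration check on made-up data: `duration_ms` is converted to minutes, and Pearson correlation is compared with the rank-based Spearman correlation, which also picks up monotonic but non-linear trends. Coefficients near 0 under both would be consistent with no clear relationship.

```r
# Made-up durations (milliseconds) and popularity scores
songs <- data.frame(
  duration_ms      = c(180000, 210000, 240000, 200000, 300000, 150000),
  track_popularity = c(60, 40, 55, 70, 45, 50)
)

# Milliseconds to minutes for readability
duration_min <- songs$duration_ms / 60000

# Linear (Pearson) and rank-based (Spearman) correlation
r_duration   <- cor(duration_min, songs$track_popularity)
rho_duration <- cor(duration_min, songs$track_popularity, method = "spearman")
c(pearson = r_duration, spearman = rho_duration)
```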

Conclusion

The analysis of the Spotify dataset yielded the following results:

  • Pop and Latin are the most popular genres.

  • Higher danceability and valence are associated with higher popularity.

  • Energy and loudness are positively correlated.

  • Energy and acousticness are negatively correlated.

  • Track duration does not have a clear effect on track popularity.